perm filename HTSWTS.MRC[UP,DOC]9 blob
sn#688000 filedate 1982-11-08 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00006 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .DEVICE XGP
C00003 00003 ←%3How to start WAITS
C00007 00004 %2FIXING THE SYSTEM:%1
C00012 00005 %2RELOADING:%1
C00027 00006 %2RESTARTING THE KA-10:%1
C00030 ENDMK
C⊗;
.DEVICE XGP
.!XGPCOMMANDS←"/PMAR=0";
.!XGPLFTMAR←216;
.PAGE FRAME 999 HIGH 80 WIDE
.AREA TEXT LINES 1 TO 999 CHARS 1 TO 80
.PLACE TEXT
.FONT 1 "BASL30";
.FONT 2 "BASI30";
.FONT 3 "BUCK75";
.FONT 4 "FIX25";
.FONT 5 "FIX13X";
.TURN ON "←{%α↓_#"
.AT "ffi" ⊂ IF THISFONT ≤ 2 THEN "≠" ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 2 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff" ⊂ IF THISFONT ≤ 2 THEN "≥" ELSE "fαf" ⊃;
.AT "fi" ⊂ IF THISFONT ≤ 2 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl" ⊂ IF THISFONT ≤ 2 THEN "∨" ELSE "fαl" ⊃;
←%3How to start WAITS
%2FIND A WIZARD:%1
.BEGIN INDENT 5
Before you do anything, you should try to find a wizard. Maybe there is
already one working on the problem -- if so, he will be very angry if you
disturb the machine. Check in the 030 block of offices for ME in 030d;
or, for network problems, look for Tovar on the 4th floor; or, as a last
resort, call REG's office (7-3236, room 428). If necessary, call a
wizard at home (but not in the middle of the night unless it is an
%2urgent%* problem, as defined below).
Home phone numbers of wizards:
.BEGIN NOFILL; INDENT 0;
%4
Martin Frost (ME) (11am → 11pm) (9)329-9081 or (9)325-8507 or 21-192 (beeper)
Tovar (TVR) (network problems only) 7-4971 or (9)327-3622
.END
.BEGIN INDENT 0;
(%1Do %2not%* use 7-4975 to call out on, since that is the number that a
wizard might be trying to call in on. If it rings, %2answer it%*.)
%1For an %2urgent%* problem you can beep ME at %2any%* time, %2but it better
really be urgent!%* A problem is %2urgent%* if it is continuing or
repeating, e.g., you can't reload, or a similar emergency. %2A#plain
system crash doesn't count as urgent%*, unless you are unable to reload
after following the reloading instructions below. Even wizards don't like
being awakened in the middle of the night.
To call by beeper, dial the 21-xxx number, wait for the beeping to stop,
and then %2describe the problem in 10 seconds or less%*. Your message is
transmitted right then by radio to the beepee.
.END
Sometimes a wizard will dial up the CTY to fix things from home (possibly
without your knowing it). When this happens, he may need some local help
from you. %2Stand by%1 in case he asks you to do something like check
status lights.
If you can't get in touch with a wizard, you'll have to fix it yourself;
see the instructions below. After you fix it, make a note in the log with
the date, time and description of the failure (include any message typed
out on the CTY). %2Sign your log note (with your SAIL programmer name, if
any)%1.
.END
**********************************************************************
.SKIP 1
%2FIXING THE SYSTEM:%1
.BEGIN INDENT 5
Many crashes are bug traps and will print a message %2followed%* by:
←%4Find a WIZARD or type "$P". $ means ESC. You're in DDT.%1
If it prints this and you can't find a wizard, try typing [ESC] %4P%1 and
a couple of [RETURN]s. If you get monitor dots, you're in luck. Type
%4BEEP%1 and [RETURN] to tell everybody the good news. %2Don't forget to
log the crash!%1
If after you type %4$P%1 the same thing happens, try %4$P%1 again. If it
happens repeatedly, you'll have to reload, so go to step 1 below. Certain
errors, like %4Page Fail, PI in Progress%1, require a wizard's
intervention; without help, the routine for such an error will just retry
the losing instruction, which naturally will fail in the same way again.
Routines for other errors are able to fix the problem or bypass it and get
the system running again automatically when you type %4$P%1. So the thing
to do is to try %4$P%1 a few times (if once doesn't fix things) before you
give up and reload (but always try to find a wizard before typing %4$P%1
even once).
If the system gets a %4NXM%1 (non-existent memory error), you may have to
reset a hung memory (reloading won't work). See step 200 below.
Sometimes even resetting the memory won't help; in that case the memories
may have to be reconfigured or fixed. You should leave that for a wizard
to do.
If the machine has powered itself off, then the %4FAULT%1 light will be lit
on the KL-10's console PDP-11 front panel (where it says "KL-10", that's
really a PDP-11). Usually this indicates an air-flow problem in the cpu
or a tripped circuit breaker. The cause of the fault will be indicated by
one (or more) of several indicator lights inside the back of the console
PDP-11 cabinet, at the bottom. Before doing anything else, you should see
which indicator lights are on back there. Usually it is %4AIR FLOW CPU%1
or %4CKT BKR TRIP%1. Log the problem before continuing. Then try
very hard to find a wizard. Do %2NOT%1 power the system back on unless a
wizard tells you to do so!
If one of the messages %4?10 CLKOP%1 or %4?10 TTI%1 was printed on the
CTY, a memory is probably losing (hung or powered off), or the microcode
may be hung; try the command %4MC%* and [RETURN] to see if that helps. If
not, you may have to reset a memory (see step 200 below) and/or reload
(but first, seek a wizard!).
The message %4?10 CMD ERR%1 usually means that the wrong version of %4KLDCP%* is
loaded; see step 104 below.
If no message was printed, or a message was printed which doesn't look like any
of those above, you will probably have to reload (if you can't find a
wizard).
.END
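The cases above amount to a lookup from what the CTY printed to what to try next. The following is only a rough sketch of that lookup in Python: the message fragments are quoted from this section, but the table, the function name and the wording of each action are made up for illustration. In every case the first move is still to look for a wizard.

```python
# Illustrative only: the message fragments are quoted from the text above;
# the table layout, function name, and action wording are hypothetical.
TRIAGE = [
    ('type "$P"', "bug trap: type ESC P and a few RETURNs; retry a few times, then reload (step 1)"),
    ("NXM", "reset a hung memory (step 200); reloading won't work"),
    ("?10 CLKOP", "type MC; if that fails, reset a memory (step 200) and/or reload"),
    ("?10 TTI", "type MC; if that fails, reset a memory (step 200) and/or reload"),
    ("?10 CMD ERR", "wrong KLDCP version is probably loaded (step 104)"),
]


def triage(cty_text, fault_light_on=False):
    """Suggest what to try next, given the last thing printed on the CTY."""
    # A lit FAULT light means the machine powered itself off; nothing else applies.
    if fault_light_on:
        return "note the indicator lights, log it, find a wizard; do NOT power on"
    for fragment, action in TRIAGE:
        if fragment in cty_text:
            return action
    # No message, or one that matches nothing above.
    return "unrecognized: reload (step 1) if no wizard can be found"
```

For example, triage("?10 CMD ERR") points at step 104, and an empty string falls through to the reload case.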
**********************************************************************
.SKIP 1
%2RELOADING:%1
.BEGIN INDENT 5
1. If there has been a power failure, go to step 100.
2. Type %4↑X%1 (i.e., hold down %4CTRL%1 and type %4X%1). The response
should be %4KLDCP%1 (or else it may echo simply as %4↑X%*). If the
command typed in the next step doesn't echo, try this step again; then if
typing the next step's command still doesn't seem to work, go to step 100.
3. Type %4SP%* and [RETURN]. This stops the KL-10 and records useful
information, including the PC, on the CTY for later perusal by a wizard.
If this command gets you the message %4?UCODE HUNG%*, then type the
command %4ALL%* and [RETURN]; this logs a few lines of information
so a wizard can figure out how the microcode was hung. In either case,
go on to step 4 next.
4. Type %4DS%1 and [RETURN]. If %4DS%1 gives you the proper response of
%4DSKDMP%1 and a star (%4*%1), go to step 5. If you get the message
%4LOAD DSKDMP - USE LD%1, then you'll have to load the DSKDMP bootstrapper
from DECtape into the PDP-11 by typing the command %4LD BOOT%1 and
[RETURN] (if the system is running with only one disk controller, then the
command to use here is %4LD NBOOT1%1; normally two disk controllers are in
use). After doing %4LD BOOT%1, start step 4 over again. If you get the
message %4DEX ERROR IN DS%1, then either there is a hung memory which
needs to be reset (see step 200) or the KL's memory adapter needs to be
configured (see step 102); perform the indicated step (200 or 102) and
return to the beginning of step 4. If you still get %4DEX ERROR%1 after
performing both steps 200 and 102, then there is probably a failing memory
and you'll have to get help from a wizard. If %4DS%* gets no response at
all (except maybe %4>.%*), try resetting the C1 disk channel (next to the
KL-10, on the end) by pushing its RESET button inside its front door. If
you've tried all of the appropriate suggestions in this step and %4DS%*
still fails, go to step 100.
5. Type %4WAITS%1 and [RETURN]. If the system reloads and starts, you are
winning. If the system doesn't start, you must get help. If (and only
if!) the CTY says %4?10 CMD ERR%1 at this point (which it will if you
reloaded KLDCP from DECtape in step 100c), then you will have to perform
step 104 to reload the proper version of KLDCP. In any case, %2don't
forget to log the cause of the crash and the reload!%1
.END
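Step 4 has the most branches, so here is a rough Python sketch of it as a function from the response to DS to the next action. The function name, its flags and the exact matching of the responses are assumptions made for illustration. The text says to perform whichever of steps 200 and 102 is indicated; trying 200 before 102 is merely the order chosen here.

```python
def next_after_ds(response, reset_done=False, config_done=False, c1_reset_done=False):
    """Return the next action for a given CTY response to the DS command.

    The three flags (hypothetical) record which remedies were already tried.
    """
    text = response.strip()
    # Check this first: the message itself contains the word DSKDMP.
    if "LOAD DSKDMP" in text:
        return "type LD BOOT (LD NBOOT1 with one disk controller), then repeat step 4"
    if "DEX ERROR" in text:
        if not reset_done:
            return "step 200, then repeat step 4"
        if not config_done:
            return "step 102, then repeat step 4"
        return "probably a failing memory: get a wizard"
    # Proper response: DSKDMP and a star.
    if "DSKDMP" in text and "*" in text:
        return "step 5"
    # No response at all, or only the KLDCP prompt.
    if text in ("", ">.") and not c1_reset_done:
        return "reset the C1 disk channel, then repeat step 4"
    return "step 100"
```

With the defaults, next_after_ds("DEX ERROR IN DS") sends you to step 200; once both flags for 200 and 102 are set, the same response means a wizard is needed.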
%2Don't come here unless directed to by the steps above.%1
.BEGIN INDENT 5
100. KLDCP is the PDP-11 console program. It prompts with "%4>.%1"
(a greater-than sign and a dot). By typing carriage return, you should be
able to get another such prompt. If so, KLDCP is running; go to step 101.
If you don't get the KLDCP prompt, continue here with 100a.
100a. Try restarting KLDCP: set 100014 in the PDP-11 switches (switches
15, 3 and 2 up, all the rest down); push HALT/ENABLE down and then restore
it; push LOAD ADDRESS down and restore it; press START. KLDCP should
respond with a prompt; if so, go to 101, else 100b.
100b. Try restarting KLDCP again, this time with 100004 in the address
switches (bits 15 and 2 up). If you get the KLDCP prompt, go to step 101;
otherwise try one more starting address, namely 100010 (bits 15 and 3 up).
If this finally works, go to 101, else go to 100c.
100c. If restarting KLDCP fails, KLDCP must be reloaded from DECtape.
Note that the DECtape contains an OLD VERSION of KLDCP. The new version
will have to be loaded later (from disk) in step 104. Make sure a DECtape
labeled "KL10 bootstrap" is mounted on a PDP-11 DECtape drive that is
selected to unit 0 and is enabled for "remote" (i.e., computer) operation.
Press the "LOAD DECTAPE" button (located above and to the left of the red
"Emergency Power Off" button) and hold it for at least a slow count to
one. The DECtape should spin and eventually something like %4TCDP
monitor%1 should be typed. Type in %4KLDCP%1 and [RETURN]. KLDCP should
load and type a message -- go to step 101. If you don't get to TCDP you
might try pressing the LOAD DECTAPE button again. If you get to TCDP and
the %4KLDCP%1 command doesn't work, get help.
101. If there's been a power failure you will have to reload the
KL10's microcode. Type %4DT0 LR SU%1 and [RETURN] to KLDCP. If any
error messages are printed, try again. If you can't get the microcode
loaded without errors, get help. After successfully loading the
microcode, type#%4SW#3#600000%1 and [RETURN] to set the switch register
to its normal value. After reloading the microcode, you should
configure the memory in step 102.
102. Make sure the PDP-11 data switches are all back down (normal
position), and now configure the memory adapter by typing %4I X4%1 and
[RETURN]. This will type out about 30 command lines as they are being
executed; it will end by typing something like %4KL STOPPED%1.
103. Now try steps 4 and 5. If %4DS%1 still doesn't work (and you've
done an %4LD BOOT%1 if called for), then something major is probably
broken; get help.
104. Reload the right version of KLDCP (with the system already running,
e.g., after step 5 is successful). If you had to load KLDCP from DECtape
in step 100c, then when the system runs, the 11 may halt with the message
%4?10 CMD ERR%1. In this case, the correct version of KLDCP must be
loaded, but the system should be partially usable anyway. If the CTY is
working (i.e., the console-11 is somewhat happy), then you can just type
%411LOAD%* on the CTY. Normally the CTY is not usable with the system if
KLDCP is not happy, so a suitably privileged user must log in and incant
either
.BEGIN SELECT 4; no fill; no just; SKIP;
11LOAD
%1or%*
RUN 11LOAD[KL,SYS]
AGRONK
KLDCP.L11[KL,SYS]
.END
.END
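The KLDCP starting addresses in steps 100a and 100b are octal numbers, and each one maps onto the PDP-11 switches bit by bit. The short Python sketch below shows the conversion. It assumes 16 switches numbered 0 to 15 with switch 0 at the low-order end, which agrees with the switch numbers given in those steps; the function name is made up.

```python
def switches_up(octal_value):
    """Return the switch numbers (0 = low-order bit) that must be up."""
    value = int(octal_value, 8)  # the addresses in the text are octal
    return [bit for bit in range(15, -1, -1) if value & (1 << bit)]


for address in ("100014", "100004", "100010"):
    print(address, switches_up(address))
# 100014 [15, 3, 2]
# 100004 [15, 2]
# 100010 [15, 3]
```

All switches not listed stay down.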
.SKIP
%2Don't come here unless explicitly directed to by the instructions above.%*
.BEGIN INDENT 5;
200. Resetting hung memories. If some error condition such as %4DEX ERROR
IN DS%1 or %4NXM%1 indicates that there is probably a hung memory, then
the memory needs to be reset. There are two types of memories: the MG
memory (in two identical cabinets labelled, in the upper left corners,
MGB and MGA) and the ARM-10M
memory (in one cabinet to the right of the two MG boxes). The three memory
cabinets are located in the row behind the KL-10. Before resetting a
memory, you should attempt to see if it is hung. This is done differently
for the two different types of memory. (If one or more of the three
memory boxes has %2no lights on%*, then that box has probably turned
itself off -- in that case, find a wizard rather than trying to fix it
yourself!)
201. MGA and MGB: Each of these cabinets has an array of lights at the
top. The bottom two rows in this array indicate the status of the two
controllers (cont 0 and cont 1) within each cabinet. So there are four
controllers to check for being hung (or for having parity errors). On
each controller's row of status lights, there is at the left end a light
labelled %4UA%1 (for Unit Available); if this light is out, the controller
is hung. On the right end of each row of status lights is a light
labelled %4PAR ERR%1; if this light is on, then that controller has seen a
parity error. If you notice a parity error, you should record it and also
record which of the %4RD%1 (read) and %4WR%1 (write) lights at the other
end of that row is on. If you find a hung MG controller, you should do
the following to reset it: (a) first push the RESET button on the front
of the C1 Disk Channel (next to the KL-10), and (b) then push the RESET
button on the bottom front of the MG that was hung (in each case, you must
open the magnetic door to get at the RESET button). DO NOT RESET THE
MEMORY SIMPLY BECAUSE YOU FIND A PARITY ERROR LIGHT ON! The parity error
light is simply a flag and does not affect memory operation.
202. ARM-10M: This memory has four HUNG lights at the bottom of the main
array of lights (visible through the window). The four HUNG lights are
spread out, one for each sector, and each one is next to a RESET switch.
(You may not notice the lights if none is on, because of the dark
background, but you should see the word HUNG above a blank space where the
light really is, next to a RESET switch.) The ARM-10M also has parity
error lights, in a line of four, one for each sector, labelled SECTOR
PARITY ERROR. And just above those lights are four others labelled SECTOR
CONTROL ERROR. Before resetting a hung memory, you should note whether
any of the parity error or control error lights are on. To reset a hung
sector, push the RESET button next to the HUNG light that is on (you do NOT
need to reset the C1 before resetting the ARM-10M). Again, NEVER RESET A
MEMORY JUST BECAUSE IT HAS A PARITY ERROR LIGHT ON! You only need reset a
memory if it is actually hung. If the ARM-10M is hung, you will end up
having to restart the KA-10 (after you get the system running again).
.END
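The rule in step 201 is easy to get backwards under pressure: the UA light decides whether to reset, and the PAR ERR light only decides what to write down. A minimal Python sketch of that rule for one MG controller follows; the function and argument names are invented for illustration.

```python
def mg_controller_action(ua_lit, par_err_lit, rd_lit=False, wr_lit=False):
    """Return the list of things to do for one MG controller's row of lights."""
    notes = []
    if par_err_lit:
        # A parity error is only recorded, together with the RD/WR lights.
        which = [name for name, lit in (("RD", rd_lit), ("WR", wr_lit)) if lit]
        notes.append("record parity error (" + ("/".join(which) or "neither RD nor WR") + " lit)")
    if not ua_lit:
        # UA (Unit Available) out means the controller is hung.
        notes.append("hung: push RESET on the C1 disk channel, then RESET on this MG")
    else:
        notes.append("not hung: do not reset")
    return notes
```

A controller with UA lit and PAR ERR lit therefore gets a log entry and nothing more.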
**********************************************************************
.SKIP 1
%2RESTARTING THE KA-10:%1
.BEGIN INDENT 5
These instructions are for restarting the KA-10, which is the secondary
processor. They assume that the main timesharing system itself is
running; presumably you are reading this because you were told to restart
the KA-10 by an XGP spooling of yours or by WAITS when you reloaded. As
always, make sure a wizard is not already working on it.
The KA-10 is the black computer (the KL-10 is blue). Its console
panel should be about four feet to the left of this sheet. It will
have a lot of lights and switches on it.
To %2restart%1 the KA, first check the KA's address switches to be sure
that they are set to 204. If you don't know how to do this, don't worry,
since its switches should always be set to 204. In addition, if the KA
stopped with a memory stop, make a log entry with details of the memory
lossage (the note on the KA console says how to do this).
Now, restart the KA by pressing the RESET button on the KA-10 console
panel, then pressing the START button. You should get a message on the
teletype next to the KA saying something like %4KA10 RESTARTED%1.
If you don't get this message, or if it is followed by some other message
that looks like an error message, try reloading the KA-10 (%2not%1 the
regular system!!).
To %2reload%1 the KA, press the KA-10 RESET button again. Go to the KL-10
CTY and type %4P2LOAD%1 and [RETURN]. When it finishes, press the START
button on the KA-10 console panel. If it doesn't work now, get help.
.END
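The KA-10 procedure has only two attempts before you give up: a restart, then a reload. The Python sketch below lays that out. The function name and arguments are hypothetical, and it assumes that a reload is judged by the same test as a restart (the restarted message with no error after it), which the text does not actually say.

```python
def ka10_next(attempt, got_restarted_message, error_followed=False):
    """attempt is 'restart' or 'reload'; return what to do next."""
    if got_restarted_message and not error_followed:
        return "done"
    if attempt == "restart":
        # The restart failed, so fall back to reloading the KA-10 only.
        return "reload: press KA-10 RESET, type P2LOAD on the KL-10 CTY, then press KA-10 START"
    return "get help"
```

So a failed restart leads to one reload, and a failed reload leads to a wizard.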
**********************************************************************
.SKIP 1
The PUB source for this file is %4HTSWTS.MRC[UP,DOC]%1. Corrections
marked on this sheet will be noted therein.